AITopics | fundamental frequency

fundamental frequency

Optimal spectral transportation with application to music transcription

Rémi Flamary, Cédric Févotte, Nicolas Courty, Valentin Emiya

Neural Information Processing Systems | Mar-23-2026, 06:18:38 GMT

Many spectral unmixing methods rely on the non-negative decomposition of spectral data onto a dictionary of spectral templates.

artificial intelligence, frequency, machine learning, (16 more...)

Neural Information Processing Systems

Country: Europe > France (0.28)

Industry:

Media > Music (0.68)
Leisure & Entertainment (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Exploring Dynamic Parameters for Vietnamese Gender-Independent ASR

Leang, Sotheara, Castelli, Éric, Vaufreydaz, Dominique, Sam, Sethserey

arXiv.org Artificial Intelligence | Aug-1-2025

The dynamic characteristics of the speech signal provide temporal information and play an important role in enhancing Automatic Speech Recognition (ASR). In this work, we characterized the acoustic transitions in a ratio plane of Spectral Subband Centroid Frequencies (SSCFs) using polar parameters to capture the dynamic characteristics of speech and minimize spectral variation. These dynamic parameters were combined with Mel-Frequency Cepstral Coefficients (MFCCs) in Vietnamese ASR to capture more detailed spectral information. SSCF0 was used as a pseudo-feature for the fundamental frequency (F0) to describe the tonal information robustly. The findings showed that the proposed parameters significantly reduce word error rates and exhibit greater gender independence than the baseline MFCCs.

artificial intelligence, machine learning, speech recognition, (15 more...)

arXiv.org Artificial Intelligence

2507.22964

Country:

Europe > France (0.29)
Asia (0.29)

Genre: Research Report > New Finding (0.89)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Real-Time Pitch/F0 Detection Using Spectrogram Images and Convolutional Neural Networks

Zhao, Xufang, Tsimhoni, Omer

arXiv.org Artificial Intelligence | Apr-9-2025

Pitch (also called F0 or fundamental frequency) is a very important voice feature for smart mobility features, such as driver emotion detection, personalized vehicle profiles, and secure speaker identification. This paper presents a novel approach to detecting F0 that uses Convolutional Neural Networks (CNN) and image processing techniques to estimate pitch directly from spectrogram images. Our new approach demonstrates very good detection accuracy; a total of 92% of predicted pitch contours have strong or moderate correlations to the true pitch contours. Furthermore, the experimental comparison between our new approach and other state-of-the-art CNN methods reveals that our approach can enhance the detection rate by approximately 5% across various Signal-to-Noise Ratio (SNR) conditions. Pitch detection is very widely used for smart mobility features. For example, as shown in Fig. 1, a pitch contour can be used to train a deep learning neural network for driver emotion detection, which can warn of road rage.
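For readers unfamiliar with what F0 estimation computes, here is a minimal sketch of a classical autocorrelation pitch estimator. It is not the CNN-on-spectrogram method described in the abstract; it is a swapped-in baseline technique shown only to make the notion of fundamental frequency concrete. The function name `estimate_f0_autocorr`, the search range, and the synthetic 200 Hz test tone are all illustrative assumptions.

```python
import math


def estimate_f0_autocorr(signal, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate F0 (Hz) as sample_rate / lag, where lag maximizes the autocorrelation.

    Illustrative baseline only; the search range [f_min, f_max] is an assumption.
    """
    min_lag = int(sample_rate / f_max)  # shortest period considered
    max_lag = int(sample_rate / f_min)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the signal with itself shifted by `lag`.
        score = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Hypothetical test signal: a clean 200 Hz sine sampled at 8 kHz for 0.1 s.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(800)]
print(estimate_f0_autocorr(tone, sample_rate))  # 200.0
```

On clean periodic input the strongest autocorrelation peak sits at one period, so the estimate recovers the tone's frequency. Real speech is noisier and less periodic, which is one motivation for learned estimators like the one the abstract describes.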

artificial intelligence, detection, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2504.06165

Country: North America > United States > Michigan (0.14)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry:

Transportation > Ground > Road (0.48)
Automobiles & Trucks (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Audio-to-Image Encoding for Improved Voice Characteristic Detection Using Deep Convolutional Neural Networks

Atif, Youness

arXiv.org Artificial Intelligence | Mar-7-2025

This paper introduces a novel audio-to-image encoding framework that integrates multiple dimensions of voice characteristics into a single RGB image for speaker recognition. In this method, the green channel encodes raw audio data, the red channel embeds statistical descriptors of the voice signal (including key metrics such as median and mean values for fundamental frequency, spectral centroid, bandwidth, rolloff, zero-crossing rate, MFCCs, RMS energy, spectral flatness, spectral contrast, chroma, and harmonic-to-noise ratio), and the blue channel comprises subframes representing these features in a spatially organized format. A deep convolutional neural network trained on these composite images achieves 98% accuracy in speaker classification across two speakers, suggesting that this integrated multi-channel representation can provide a more discriminative input for voice recognition tasks.

classification, frequency, spectral centroid, (13 more...)

arXiv.org Artificial Intelligence

2503.05929

Genre: Research Report (1.00)

Industry:

Media > Music (0.47)
Leisure & Entertainment (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.62)

Beyond Data Scarcity: A Frequency-Driven Framework for Zero-Shot Forecasting

Nochumsohn, Liran, Moshkovitz, Michal, Avner, Orly, Di Castro, Dotan, Azencot, Omri

arXiv.org Artificial Intelligence | Nov-24-2024

Time series forecasting is critical in numerous real-world applications, requiring accurate predictions of future values based on observed patterns. While traditional forecasting techniques work well in in-domain scenarios with ample data, they struggle when data is scarce or not available at all, motivating the emergence of zero-shot and few-shot learning settings. Recent advancements often leverage large-scale foundation models for such tasks, but these methods require extensive data and compute resources, and their performance may be hindered by ineffective learning from the available training set. This raises a fundamental question: What factors influence effective learning from data in time series forecasting? Toward addressing this, we propose using Fourier analysis to investigate how models learn from synthetic and real-world time series data. Our findings reveal that forecasters commonly suffer from poor learning from data with multiple frequencies and poor generalization to unseen frequencies, which impedes their predictive performance. To alleviate these issues, we present a novel synthetic data generation framework, designed to enhance real data or replace it completely by creating task-specific frequency information, requiring only the sampling rate of the target data. Our approach, Freq-Synth, improves the robustness of both foundation and non-foundation forecast models in zero-shot and few-shot settings, facilitating more reliable time series forecasting under limited data scenarios.

Time series forecasting (TSF) plays a critical role in various areas, such as finance, healthcare, and energy, where accurate predictions of future values are essential for decision-making and planning. Traditionally, in-domain learning has been the common setting for developing forecasting models, where a model is trained using data from the same domain it will later be deployed in (Salinas et al., 2020; Zhou et al., 2021). This ensures that the model captures the patterns, seasonality, and trends specific to the target domain, improving its predictive performance. However, a significant challenge arises when there is scarce or no historical information available for training, limiting the ability to apply traditional in-domain learning approaches (Sarmas et al., 2022; Fong et al., 2020). In such cases, the emergence of zero-shot (ZS) and few-shot (FS) learning settings offers potential solutions. Zero-shot learning enables models to generalize to new, unseen domains without requiring domain-specific data by leveraging knowledge transfer from other domains or tasks.

frequency, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2411.15743

Country:

South America > Argentina > Patagonia > Río Negro Province > Viedma (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel > Southern District > Beer-Sheva (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.92)
Health & Medicine > Therapeutic Area > Immunology (0.92)
Health & Medicine > Epidemiology (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Reproducible Machine Learning-based Voice Pathology Detection: Introducing the Pitch Difference Feature

Vrba, Jan, Steinbach, Jakub, Jirsa, Tomáš, Verde, Laura, De Fazio, Roberta, Homma, Noriyasu, Zeng, Yuwen, Ichiji, Key, Hájek, Lukáš, Sedláková, Zuzana, Mareš, Jan

arXiv.org Artificial Intelligence | Oct-14-2024

In this study, we propose a robust set of features derived from thorough research into contemporary practices in voice pathology detection. The feature set is based on a combination of handcrafted acoustic features. Additionally, we introduce pitch difference as a novel feature. We combine this feature set, containing data from the publicly available Saarbrücken Voice Database (SVD), with preprocessing using the K-Means Synthetic Minority Over-Sampling Technique algorithm to address class imbalance. Moreover, we applied multiple ML models as binary classifiers. We utilized support vector machine, k-nearest neighbors, naive Bayes, decision tree, random forest and AdaBoost classifiers. To determine the best classification approach, we performed a grid search over feasible hyperparameters of the respective classifiers and over subsets of features. Our approach has achieved state-of-the-art performance, measured by unweighted average recall, in voice pathology detection on the SVD database. We intentionally omit accuracy as it is a highly biased metric in the case of unbalanced data compared to the aforementioned metrics. The results are further enhanced by eliminating the potential overestimation of the results with repeated stratified cross-validation. This advancement demonstrates significant potential for the clinical deployment of ML methods, offering a valuable tool for an objective examination of voice pathologies. To support our claims, we provide a publicly available GitHub repository with DOI 10.5281/zenodo.13771573. Finally, we provide a REFORMS checklist.
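The abstract's point about accuracy being biased on unbalanced data can be illustrated with a small sketch. Unweighted average recall (UAR) is the plain mean of the per-class recalls, so every class counts equally regardless of its size. The labels and the 90/10 class split below are hypothetical and are not taken from the paper or from the SVD data; the helper names are likewise illustrative.

```python
def recall_per_class(y_true, y_pred):
    """Return {class: recall}, computed over the classes present in y_true."""
    recalls = {}
    for cls in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in indices if y_pred[i] == cls)
        recalls[cls] = hits / len(indices)
    return recalls


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class has equal weight."""
    recalls = recall_per_class(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical imbalanced set: 90 samples of class 1, 10 samples of class 0.
y_true = [1] * 90 + [0] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = [1] * 100

print(accuracy(y_true, y_pred))                   # 0.9
print(unweighted_average_recall(y_true, y_pred))  # 0.5
```

The always-majority classifier looks strong under accuracy but scores at chance level under UAR, which is the bias the authors avoid by reporting UAR.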

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2410.10537

Country:

Europe > Germany > Saarland > Saarbrücken (0.14)
Europe > Czechia > Prague (0.04)
North America > United States > Massachusetts (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Diagnostic Medicine (1.00)

Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation

Lee, Jin Woo, Park, Jaehyun, Choi, Min Jun, Lee, Kyogu

arXiv.org Artificial Intelligence | Jul-7-2024

While significant advancements have been made in music generation and differentiable sound synthesis within machine learning and computer audition, the simulation of instrument vibration guided by physical laws has been underexplored. To address this gap, we introduce a novel model for simulating the spatio-temporal motion of nonlinear strings, integrating modal synthesis and spectral modeling within a neural network framework. Our model leverages physical properties and fundamental frequencies as inputs, outputting string states across time and space that solve the partial differential equation characterizing the nonlinear string. Empirical evaluations demonstrate that the proposed architecture achieves superior accuracy in string motion simulation compared to existing baseline architectures. The code and demo are available online.

frequency, international conference, synthesis, (15 more...)

arXiv.org Artificial Intelligence

2407.05516

Country:

Asia > South Korea > Seoul > Seoul (0.05)
North America > United States > Illinois (0.04)
Europe > Ireland (0.04)
Asia > Vietnam > Hanoi > Hanoi (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models

Li, Xiang, Bu, Fan, Mehrish, Ambuj, Li, Yingting, Han, Jiale, Cheng, Bo, Poria, Soujanya

arXiv.org Artificial Intelligence | Mar-31-2024

Neural Text-to-Speech (TTS) systems find broad applications in voice assistants, e-learning, and audiobook creation. The pursuit of modern models, like Diffusion Models (DMs), holds promise for achieving high-fidelity, real-time speech synthesis. Yet, the efficiency of multi-step sampling in Diffusion Models presents challenges. Efforts have been made to integrate GANs with DMs, speeding up inference by approximating denoising distributions, but this introduces issues with model convergence due to adversarial training. To overcome this, we introduce CM-TTS, a novel architecture grounded in consistency models (CMs). Drawing inspiration from continuous-time diffusion models, CM-TTS achieves top-quality speech synthesis in fewer steps without adversarial training or pre-trained model dependencies. We further design weighted samplers to incorporate different sampling positions into model training with dynamic probabilities, ensuring unbiased learning throughout the entire training process. We present a real-time mel-spectrogram generation consistency model, validated through comprehensive evaluations. Experimental results underscore CM-TTS's superiority over existing single-step speech synthesis systems, representing a significant advancement in the field.

cm-tts, diffgan-tts, synthesis, (16 more...)

arXiv.org Artificial Intelligence

2404.00569

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Singapore (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.54)
Energy (0.46)
Media (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Optimal spectral transportation with application to music transcription

Neural Information Processing Systems | Mar-12-2024, 09:46:44 GMT

Many spectral unmixing methods rely on the non-negative decomposition of spectral data onto a dictionary of spectral templates.

divergence, frequency, fundamental frequency, (14 more...)

Neural Information Processing Systems

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)

Industry:

Media > Music (0.46)
Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Unsupervised Harmonic Parameter Estimation Using Differentiable DSP and Spectral Optimal Transport

Torres, Bernardo, Peeters, Geoffroy, Richard, Gaël

arXiv.org Artificial Intelligence | Jan-15-2024

In neural audio signal processing, pitch conditioning has been used to enhance the performance of synthesizers. However, jointly training pitch estimators and synthesizers is a challenge when using standard audio-to-audio reconstruction loss, leading to reliance on external pitch trackers. To address this issue, we propose using a spectral loss function inspired by optimal transportation theory that minimizes the displacement of spectral energy. We validate this approach through an unsupervised autoencoding task that fits a harmonic template to harmonic signals. We jointly estimate the fundamental frequency and amplitudes of harmonics using a lightweight encoder and reconstruct the signals using a differentiable harmonic synthesizer. The proposed approach offers a promising direction for improving unsupervised parameter estimation in neural audio applications.
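To see why a transport-based spectral loss can guide pitch estimation where a pointwise loss cannot, consider the following minimal sketch. It uses the generic one-dimensional Wasserstein-1 distance, computed as the summed absolute difference of cumulative distributions. The abstract does not specify the paper's exact loss, so this should be read as an assumption-laden illustration of the general idea. The function names and the single-peak toy spectra are hypothetical.

```python
def normalize(spectrum):
    """Scale a non-negative spectrum so that its bins sum to 1."""
    total = sum(spectrum)
    return [value / total for value in spectrum]


def spectral_w1(spec_a, spec_b, bin_width=1.0):
    """1-D Wasserstein-1 distance between two spectra on the same frequency grid.

    Computed as the sum of |CDF_a - CDF_b| over bins, times the bin width.
    """
    cum_a = cum_b = 0.0
    distance = 0.0
    for a, b in zip(normalize(spec_a), normalize(spec_b)):
        cum_a += a
        cum_b += b
        distance += abs(cum_a - cum_b) * bin_width
    return distance


def squared_error(spec_a, spec_b):
    """Pointwise squared error between the normalized spectra."""
    return sum((a - b) ** 2 for a, b in zip(normalize(spec_a), normalize(spec_b)))


def single_peak(n_bins, peak_bin):
    """Toy spectrum with all energy in one bin (hypothetical test input)."""
    spectrum = [0.0] * n_bins
    spectrum[peak_bin] = 1.0
    return spectrum


target = single_peak(64, 20)
for shift in (1, 5, 20):
    estimate = single_peak(64, 20 + shift)
    print(shift, squared_error(target, estimate), spectral_w1(target, estimate))
```

For these non-overlapping peaks the squared error is the same for every shift, so it carries no information about how far off the estimate is, while the transport distance grows in proportion to the frequency displacement.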

amplitude, estimation, frequency, (15 more...)

arXiv.org Artificial Intelligence

2312.14507

Country: Asia > Vietnam > Hanoi > Hanoi (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.84)
